Abstract



Dropping tolerance criteria play a central role in adaptive and static F-norm minimization based Sparse Approximate Inverse (SAI) preconditioning and Factorized SAI (FSAI) preconditioning. However, such criteria have received little attention and have been treated heuristically: if the magnitude of an entry is below some empirically chosen small quantity, the entry is set to zero. The meaning of "small" is vague and has not been considered rigorously and precisely, and it has not been clear how dropping tolerances affect the quality and effectiveness of a preconditioner $M$. In this paper, we focus on the adaptive Power Sparse Approximate Inverse (PSAI) algorithm proposed by Jia and Zhu (Numer. Linear Algebra Appl., 16 (2009): 259-299) and establish a solid theory on robust selection criteria for its dropping tolerances. Using the theory, one can select dropping criteria that guarantee the non-singularity of $M$, make it as sparse as possible, and at the same time give it quality comparable to that of the possibly much denser preconditioner obtained by PSAI without dropping. The theory is adapted to all the static F-norm minimization based SAI preconditioning procedures. Numerical experiments are reported to confirm the theory and illustrate the robustness and effectiveness of the selection criteria for dropping tolerances.
1 Introduction

Preconditioned Krylov subspace methods [32] are among the most popular iterative solvers 
for the large sparse linear system of equations 

$$Ax=b, \tag{1}$$

where $A$ is a nonsingular and nonsymmetric (non-Hermitian) $n\times n$ matrix and $b$ is an $n$-dimensional vector. Sparse approximate inverse (SAI) preconditioning is an important class of general-purpose preconditioners for Krylov solvers, which aims to construct sparse approximations of $A^{-1}$ directly. There are two typical kinds of SAI preconditioning approaches. The first constructs factorized sparse approximate inverses. An effective algorithm of this kind is the approximate inverse (AINV) algorithm, which is derived from the incomplete (bi)conjugation procedure [HE]. The other kind is based on F-norm minimization and is inherently parallelizable. It constructs an SAI $M\approx A^{-1}$ by minimizing $\|AM-I\|_F$ for a specified pattern of $M$ that is either prescribed in advance or determined adaptively, where $\|\cdot\|_F$ denotes the F-norm of a matrix. A hybrid of the two kinds, the factorized approximate inverse (FSAI) preconditioning based on F-norm minimization, has been introduced by Kolotilina and Yeremin [29], who proposed an FSAI algorithm. FSAI has been generalized to block form, called BFSAI, in [23], where an adaptive algorithm is also presented that automatically generates the nonzero pattern of the BFSAI preconditioner. In addition, the idea of F-norm minimization is generalized in [21] by introducing a sparse, readily inverted target matrix $T$. $M$ is then computed by minimizing $\|AM-T\|_{F,H}$ over a space of matrices with a prescribed sparsity pattern, where $\|\cdot\|_{F,H}$ is the generalized F-norm defined by $\|B\|_{F,H}^2=\langle B,B\rangle_{F,H}=\mathrm{trace}(B^THB)$, $H$ is some symmetric (Hermitian) positive definite matrix, and the superscript $T$ denotes the transpose of a matrix or vector, replaced by the conjugate transpose if $B$ is complex. A good comparison of factorized SAI and F-norm minimization based SAI preconditioning approaches can be found in [6]. Incomplete factorizations and SAIs have been shown to provide effective smoothers for multigrid; see, e.g., [TU 1 [TT | [M l [36]. For a comprehensive survey of preconditioning techniques, we refer the reader to [3].

In this paper, we focus on F-norm minimization based SAI preconditioning, where a central issue is to determine the sparsity pattern of $M$ effectively. There has been much work on a-priori pattern prescriptions; see, e.g., |21 [12l [13j [22j [33]. Once the pattern of $M$ or its envelope is given, the computation of $M$ is straightforward by solving $n$ independent least squares (LS) problems, and $M$ is then generally sparsified further. This is called a static SAI preconditioning procedure. Huckle [22] has compared different a-priori sparsity patterns and established effective upper bounds for the sparsity pattern of $M$ obtained by the well-known adaptive SPAI algorithm [19]. He shows that the patterns of $(A^TA)^kA^T$, $(I+|A|+|A^T|)^kA^T$ and $(I+A)^k$ for small $k$ can be good envelope patterns of a good $M$. These predictions are very useful for reducing communication time, since they allow the data of $A$ to be distributed to processors a priori when computing $M$ in a distributed parallel environment.

For a general sparse matrix $A$, however, determining an effective sparsity pattern of $A^{-1}$ is nontrivial. A-priori patterns may not capture the positions of large entries in $A^{-1}$ effectively when they are sparse. Or they may capture these positions only when the patterns are quite dense or even unacceptably dense, so that storage becomes a bottleneck and the construction of $M$ is impractically time consuming. To cope with this difficulty, a number of researchers have proposed adaptive strategies that start from a simple initial pattern and successively augment or adaptively adjust it until $M$ attains a certain accuracy, i.e., $\|AM-I\|<\varepsilon$ for some norm $\|\cdot\|$ and a given fairly small $\varepsilon$, or until a maximum number of nonzero entries in $M$ is reached. This idea was first proposed by Cosgrove et al. [16] and developed by Grote and Huckle [19], by Gould and Scott [18] and by Chow and Saad [15]. It appears that the SPAI preconditioning proposed by Grote and Huckle [19] is more robust than the others [3]. One of the key differences between these procedures is the adaptive way in which they generate the sparsity pattern of $M$. They all need to drop small entries to obtain a sparse $M$. Recently, Jia and Zhu [25] have proposed a Power Sparse Approximate Inverse (PSAI) procedure that determines the sparsity pattern of $M$ in a new adaptive way. Furthermore, they have developed a practical PSAI algorithm with dropping, called PSAI($tol$), that dynamically drops the entries in $M$ whose magnitudes are smaller than a prescribed tolerance $tol$ during the process. Their extensive numerical experiments demonstrate that the PSAI($tol$) algorithm is at least comparable to, and for some real-world problems considerably more effective than, the SPAI algorithm [19].

As we have seen, one generally has to use dropping strategies to make $M$ sparse so as to reduce the construction and/or application cost of $M$, whether it is determined statically or dynamically. Dropping is therefore a key step and plays a central role in designing a robust and effective SAI preconditioning procedure. Chow [13] suggests a prefiltration strategy that drops the entries of $A$ itself that are below some tolerance before determining the pattern of $M$. This prefiltration idea is also adopted in, e.g., [26, 27, 33]. Instead of prefiltration, it may be more effective to sparsify $M$ after it has been computed, which is called postfiltration; see, e.g., [TJl [M]. Wang and Zhang [35] have proposed a multistep static SAI preconditioning procedure that uses both prefiltration and postfiltration. Obviously, for a static SAI procedure, postfiltration cannot reduce the construction cost of $M$; it only reduces the application cost of $M$ at each iteration. For an adaptive SAI procedure, a more effective approach is to drop small entries dynamically, immediately after they are generated, as the construction process proceeds. This approach is more appealing since it keeps $M$ sparse throughout the whole construction process and may reduce the cost of computing $M$ very considerably. For sparsification applied to FSAI, we refer the reader to [81 171 IT71 128].

In this paper, we are concerned with dropping tolerance strategies applied to the adaptive PSAI procedure and to all the static F-norm minimization based SAI procedures. We have noticed that the dropping tolerances used in the literature are heuristic and empirical: one simply takes some small quantity, say $10^{-3}$, as the dropping tolerance. Nevertheless, the mechanism of dropping tolerances is by no means so simple. Empirically chosen tolerances are not robust: they may make preconditioning ineffective or even cause it to fail. An improperly chosen large tolerance may produce a sparser but less effective $M$; on the other hand, a too small tolerance may result in a possibly much denser $M$ that is much more time consuming to construct and/or apply, though it may be a good approximate inverse. Our experiments will confirm these points and illustrate that simply taking seemingly small quantities as dropping tolerances, as done in the literature, may produce (numerically) singular $M$, causing Krylov solvers to fail completely. Therefore, selection criteria for dropping tolerances deserve careful attention, and it is desirable to establish a rigorous mathematical theory that reveals the intrinsic relationships between the tolerances and the quality and effectiveness of $M$, enabling us to design robust and effective SAI preconditioning procedures.

We point out that dropping has been extensively used in other important preconditioning techniques such as ILU factorizations [144 131 j. Some effective selection criteria for dropping tolerances have been proposed in, e.g., [9], [20], [30].

The goal of this paper is to make an in-depth analysis of selection criteria for the dropping tolerances used in PSAI and to establish a rigorous and solid theory for them. We prove that the non-singularity and quality of $M$ obtained by PSAI depend strongly on the dropping tolerances and can be very sensitive to them. Using our theory, one can select dropping tolerances that guarantee the non-singularity of $M$, make it as sparse as possible, and at the same time give it quality as an approximate inverse similar to that of the possibly much denser $M$ obtained by PSAI without dropping. Our theory on the selection criteria for dropping tolerances is easily adapted to all the static F-norm minimization based SAI procedures. The new criteria can be applied to perform postfiltration reliably and robustly on $M$ obtained by a static SAI procedure, so that $M$ and its sparsification have comparable preconditioning quality. We present numerical experiments that illustrate our theory and show that the quality and effectiveness of $M$ depend critically on, and can be sensitive to, dropping tolerances. In particular, we report examples demonstrating that (i) smaller tolerances are unnecessary: they make $M$ possibly much denser (or at least do not drop more entries) and its construction more time consuming, without essentially improving the quality of $M$; and (ii) larger but seemingly small tolerances can produce (numerically) singular $M$, making preconditioning fail completely.

The paper is organized as follows. In Section 2, we review the basic PSAI (BPSAI) procedure and the PSAI($tol$) procedure with dropping [25]. In Section 3, we present a few results and, based on them, establish robust selection criteria for dropping tolerances. In Section 4, we test PSAI($tol$) on a wide range of real-world problems, justifying our theory and illustrating the robustness and effectiveness of our selection criteria for dropping tolerances. We also test the three static F-norm minimization based SAI procedures with the patterns of $(I+A)^k$, $(I+|A|+|A^T|)^k$ and $(A^TA)^kA^T$ for a given small integer $k$ and illustrate the effectiveness of our selection criteria for dropping tolerances. Finally, we conclude the paper with some remarks in Section 5.

2 PSAI algorithms 

The BPSAI procedure is based on F-norm minimization and determines the sparsity pattern of $M$ adaptively during the process. According to the Cayley-Hamilton theorem, $A^{-1}$ can be expressed as a matrix polynomial of $A$ of degree $m-1$ with $m\le n$:

$$A^{-1}=\sum_{i=0}^{m-1}c_iA^i \tag{2}$$

with $A^0=I$, the identity matrix, and $c_i$, $i=0,1,\ldots,m-1$, being certain constants.
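As a small illustration of (2), the NumPy sketch below recovers the constants $c_i$ from the characteristic polynomial, so it realizes (2) with $m=n$ rather than with the degree of the minimal polynomial. The function name `inverse_via_cayley_hamilton` and the test matrix are hypothetical. Forming powers of $A$ like this is numerically unstable for anything but tiny matrices, so the code only shows why $A^{-1}$ lies in the span of $I,A,\ldots,A^{m-1}$, the observation on which PSAI builds.

```python
import numpy as np

def inverse_via_cayley_hamilton(A):
    """Return (c, Ainv) with Ainv = sum_i c[i] * A^i, i = 0, ..., n-1.

    np.poly(A) gives the characteristic polynomial
    p(t) = t^n + a_1 t^(n-1) + ... + a_n as [1, a_1, ..., a_n].
    Cayley-Hamilton gives p(A) = 0, and a_n != 0 when A is nonsingular, so
    A^{-1} = -(A^(n-1) + a_1 A^(n-2) + ... + a_(n-1) I) / a_n.
    """
    n = A.shape[0]
    a = np.poly(A)
    c = np.array([-a[n - 1 - i] / a[n] for i in range(n)])  # c[i] multiplies A^i
    Ainv = np.zeros((n, n))
    P = np.eye(n)                  # current power A^i, starting from A^0 = I
    for ci in c:
        Ainv += ci * P
        P = P @ A
    return c, Ainv


A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])    # hypothetical small test matrix (real spectrum)
c, Ainv = inverse_via_cayley_hamilton(A)
print(c)
print(np.allclose(Ainv @ A, np.eye(3)))
```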

Following [25], for $i=0,1,\ldots,m-1$, we denote by $A^i(j,k)$ the entry of $A^i$ in position $(j,k)$, where $j,k=1,2,\ldots,n$, and set $\mathcal{N}_k^i=\{j\mid A^i(j,k)\neq0\}$. For $l=0,1,\ldots,l_{\max}$, define $\mathcal{J}_k^l=\bigcup_{i=0}^{l}\mathcal{N}_k^i$. Let $M=[m_1,m_2,\ldots,m_n]$ be an approximate inverse of $A$. BPSAI computes each $m_k$, $1\le k\le n$, by solving the LS problem

$$\min_{m_k(\mathcal{J}_k^l)}\|A(\cdot,\mathcal{J}_k^l)\,m_k(\mathcal{J}_k^l)-e_k\|_2,\quad l=0,1,\ldots,l_{\max}, \tag{3}$$

where $\|\cdot\|_2$ denotes the vector 2-norm and the matrix spectral norm, and $e_k$ is the $k$th column of the $n\times n$ identity matrix $I$. We exit and output $m_k$ when the minimum in (3) is less than a prescribed tolerance $\varepsilon$ or $l$ exceeds $l_{\max}$. We comment that $m_k(\mathcal{J}_k^{l+1})$ can be updated from the available $m_k(\mathcal{J}_k^l)$ very efficiently; see [25] for details. The BPSAI procedure is summarized as Algorithm 1, in which $a_k^l$ denotes the $k$th column of $A^l$ and $a_k^0=e_k$. It is easily justified that if $l_{\max}$ steps are performed, then the sparsity pattern of $M$ is contained in that of $(I+A)^{l_{\max}}$.
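The following dense NumPy sketch follows the structure of Algorithm 1 for a single column and may help make the role of $\mathcal{J}_k^l$ concrete. It rests on our own simplifying assumptions: `bpsai_column` is a hypothetical name, every LS problem (3) is re-solved from scratch with `numpy.linalg.lstsq` rather than updated as in [25], and the pattern of $a_k^{l+1}$ is propagated with $|A|$ so that accidental numerical cancellation cannot hide structural nonzeros.

```python
import numpy as np

def bpsai_column(A, k, eps, lmax):
    """Hypothetical dense sketch of Algorithm 1 (BPSAI) for column k.

    Returns m_k and ||A m_k - e_k||_2.
    """
    n = A.shape[0]
    e_k = np.zeros(n)
    e_k[k] = 1.0
    a = e_k.copy()            # a_k^0 = e_k
    J = [k]                   # J_k^0 = {k}

    def solve(J):
        # LS problem (3) restricted to the columns of A indexed by J
        m = np.zeros(n)
        sol, *_ = np.linalg.lstsq(A[:, J], e_k, rcond=None)
        m[J] = sol
        return m, np.linalg.norm(A @ m - e_k)

    m, res = solve(J)
    l = 0
    while res > eps and l < lmax:
        a = np.abs(A) @ a     # structural pattern of a_k^{l+1} = A a_k^l
        new = [int(j) for j in np.flatnonzero(a) if int(j) not in J]
        l += 1
        if not new:           # the pattern did not grow: try the next power
            continue
        J = J + new           # J_k^{l+1} = J_k^l united with the new indices
        m, res = solve(J)
    return m, res


# Usage on a hypothetical nonsymmetric tridiagonal test matrix
n = 6
A = 4 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
m0, res0 = bpsai_column(A, 0, 1e-12, 10)   # pattern fills up: exact column
m2, res2 = bpsai_column(A, 2, 0.5, 5)      # already accurate enough at l = 0
```

Working with $|A|$ means the computed pattern is the structural one, which is what the containment in $(I+A)^{l_{\max}}$ refers to; a practical code would of course use sparse storage.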

Algorithm 1 The BPSAI procedure

For $k=1,2,\ldots,n$, compute $m_k$:
1. Set $m_k=0$, $l=0$, $a_k^0=e_k$ and take $\mathcal{J}_k^0=\{k\}$ as the initial sparsity pattern of $m_k$. Choose an accuracy requirement $\varepsilon$ and the maximum number $l_{\max}$ of outer loops.
2. Solve (3) for $m_k$ and let $r_k=Am_k-e_k$.
3. while $\|r_k\|_2>\varepsilon$ and $l<l_{\max}$ do
4.    $a_k^{l+1}=Aa_k^l$, and augment the set $\mathcal{J}_k^l$ by bringing in the indices of the nonzero entries in $a_k^{l+1}$, which gives $\mathcal{J}_k^{l+1}$.
5.    $\mathcal{J}=\mathcal{J}_k^{l+1}\setminus\mathcal{J}_k^l$.
6.    if $\mathcal{J}=\emptyset$ then
7.       Set $l=l+1$, and go to 3;
8.    end if
9.    $\mathcal{J}_k^{l+1}=\mathcal{J}_k^l\cup\mathcal{J}$.
10.   Solve (3) for updating $m_k$ and $r_k=Am_k-e_k$.
11.   If $\|r_k\|_2<\varepsilon$ then break.
12.   Set $l=l+1$.
13. end while

It is shown in [25, Theorem 1] that if $A$ is irregularly sparse, that is, at least one column of $A$ has considerably more nonzero entries than the average number of nonzero entries per column, then $M$ may become dense very quickly as $l$ increases. However, when most entries of $A^{-1}$ are small, the corresponding entries of $M$ are expected to be small too and thus contribute very little to the approximation of $A^{-1}$. Therefore, in order to control the sparsity of $M$ and construct an effective preconditioner, we should apply dropping strategies to BPSAI. PSAI($tol$) serves exactly this purpose. It aims to determine an approximate sparsity pattern of $A^{-1}$ effectively and capture its large entries. At each while-loop in PSAI($tol$), the entries of the newly available $m_k$ below a prescribed tolerance $tol$ are dropped and only the large ones are retained. We describe the PSAI($tol$) algorithm as Algorithm 2, in which the sparsity pattern of $m_k$ is denoted by $\mathcal{S}_k^l$, $l=0,1,\ldots,l_{\max}$, which is updated according to steps 9-11 of Algorithm 2. Hence, for every $k$, we solve the LS problem

$$\min_{m_k(\mathcal{S}_k^l)}\|A(\cdot,\mathcal{S}_k^l)\,m_k(\mathcal{S}_k^l)-e_k\|_2,\quad l=0,1,\ldots,l_{\max}. \tag{4}$$

Similar to BPSAI, $m_k(\mathcal{S}_k^{l+1})$ can be updated from the available $m_k(\mathcal{S}_k^l)$ very efficiently.
From now on we denote by $M$ the preconditioner generated by either BPSAI or PSAI($tol$); which one is meant will be clear from the context. When necessary, we distinguish them by $M$ and $M_d$, respectively. Clearly, we cannot guarantee the non-singularity of $M$ if $\varepsilon$ is not suitably small. For a general matrix $M$, define $r_k=Am_k-e_k$ and $p=\max_{1\le k\le n}\{\mathrm{nnz}(r_k)\}$, where $\mathrm{nnz}(r_k)$ is the number of nonzero entries in $r_k$. Grote and Huckle [19] have proved that if $\|r_k\|_2<\varepsilon$ for all $k$ and $\sqrt{p}\,\varepsilon<1$, then $M$ is nonsingular. This gives a criterion for choosing $\varepsilon$. As commented in [19], for SPAI, $p$ is usually much smaller than $n$. This is also the case for BPSAI. However, this sufficient condition is very conservative, as pointed out in [19]. In practice, for a rather mildly small $\varepsilon$, say 0.3, $M$ is rarely singular. Grote and Huckle [19] have also estimated the differences between $M$ and $A^{-1}$ in the spectral norm, the F-norm and the 1-norm. Their results are general and thus apply directly to BPSAI and PSAI($tol$). So for BPSAI, the non-singularity and quality of $M$ are determined by $\varepsilon$ and $l_{\max}$, the two parameters that control the termination of the while-loop in Algorithm 1. On the one hand, a smaller $\varepsilon$ gives a higher quality but generally denser preconditioner $M$. As a result, more computational cost and memory are needed to construct $M$, and it is also more expensive to apply $M$ at each iteration of a Krylov solver. On the other hand, a bigger $\varepsilon$ may generate a sparser but possibly poorer $M$, so that the Krylov solver may need more iterations to solve the preconditioned system and the preconditioner $M$ may be ineffective. Unfortunately, the selection of $\varepsilon$ can only be empirical. As usually done in the literature, we simply take it to be a fairly small quantity, say $0.2\sim0.4$, in the numerical experiments.

Algorithm 2 The PSAI($tol$) algorithm

For $k=1,2,\ldots,n$, compute $m_k$:
1. Set $m_k=0$, $l=0$, $a_k^0=e_k$ and take $\mathcal{J}_k^0=\mathcal{S}_k^0=\{k\}$ as the initial sparsity pattern of $m_k$. Choose an accuracy requirement $\varepsilon$, a dropping tolerance $tol$ and the maximum number $l_{\max}$ of outer loops.
2. Solve (3) for $m_k$ and let $r_k=Am_k-e_k$.
3. while $\|r_k\|_2>\varepsilon$ and $l<l_{\max}$ do
4.    $a_k^{l+1}=Aa_k^l$, and augment the set $\mathcal{J}_k^l$ by bringing in the indices of the nonzero entries in $a_k^{l+1}$, which gives $\mathcal{J}_k^{l+1}$.
5.    $\mathcal{J}=\mathcal{J}_k^{l+1}\setminus\mathcal{J}_k^l$.
6.    if $\mathcal{J}=\emptyset$ then
7.       Set $l=l+1$, and go to 3;
8.    end if
9.    $\mathcal{S}_k^{l+1}=\mathcal{S}_k^l\cup\mathcal{J}$.
10.   Solve (4) for $m_k$ and compute $r_k=Am_k-e_k$. If $\|r_k\|_2<\varepsilon$, perform step 11 and break.
11.   Drop the small entries in $m_k$ whose magnitudes are below $tol$ and delete the corresponding indices from $\mathcal{S}_k^{l+1}$.
12.   Set $l=l+1$.
13. end while
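The sufficient condition of Grote and Huckle [19] recalled in this section is cheap to test once $M$ is available. The helper `grote_huckle_nonsingular` below is a hypothetical dense sketch; it counts $\mathrm{nnz}(r_k)$ exactly, so rounding-level entries are counted too, which only makes this already conservative test more conservative.

```python
import numpy as np

def grote_huckle_nonsingular(A, M, eps):
    """Return (certified, p): certified is True when ||r_k||_2 < eps for all k
    and sqrt(p) * eps < 1, where r_k = A m_k - e_k and p = max_k nnz(r_k)."""
    R = A @ M - np.eye(A.shape[0])
    small_residuals = bool(np.all(np.linalg.norm(R, axis=0) < eps))
    p = int(np.count_nonzero(R, axis=0).max())
    certified = small_residuals and bool(np.sqrt(p) * eps < 1)
    return certified, p


# Hypothetical example: Jacobi approximate inverse of a tridiagonal matrix
n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
M = np.diag(1.0 / np.diag(A))
print(grote_huckle_nonsingular(A, M, 0.3))   # each residual column has at most 2 nonzeros
print(grote_huckle_nonsingular(A, M, 0.8))   # sqrt(2) * 0.8 > 1: not certified
```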

3 Selection criteria for dropping tolerances 

First of all, we should keep in mind that all SAI preconditioning procedures rest on the basic hypothesis that the majority of the entries of $A^{-1}$ are small, that is, that $A$ has sparse approximate inverses. Mathematically, this amounts to supposing that there exists (at least) one $M$ with many small entries such that the residual $\|AM-I\|\le\varepsilon$ for a fairly small $\varepsilon$ and some matrix norm $\|\cdot\|$. The size of $\varepsilon$ measures the quality of $M$ as an approximation to $A^{-1}$: the smaller it is, the better the quality of $M$.

In the following discussions, we assume that BPSAI produces an $M$ satisfying $\|AM-I\|\le\varepsilon$ for some given small $\varepsilon$. This is definitely achieved for a suitable $l_{\max}$. Under this assumption, keep in mind that $M$ may be relatively dense but have many small entries. PSAI($tol$) aims at dynamically dropping those small entries below some given dropping tolerance $tol$ as the process proceeds and computing a new, sparser $M$, so as to reduce the storage and the computational cost of constructing and applying $M$ as a preconditioner. We are concerned with two problems. The first is how to select $tol$ to make $M$ nonsingular. As will be seen, since $tol$ varies dynamically for each $k$, $1\le k\le n$, as $l$ increases from 0 to $l_{\max}$ in Algorithm 2, we will instead denote it by $tol_k$ when computing the $k$th column $m_k$. The second is how to select the $tol_k$'s to make $M$ as sparse as possible while keeping its approximation quality comparable to that obtained by BPSAI, in the sense that the residuals of the two $M$'s are of the same order. With such a sparser $M$, Krylov solvers applied to the linear systems preconditioned by BPSAI and PSAI($tol$), respectively, are expected to use comparable numbers of iterations to achieve convergence. If so, PSAI($tol$) will be considerably more effective than BPSAI, provided that $M$ obtained by PSAI($tol$) is considerably sparser than that obtained by BPSAI. As far as we are aware, these important problems have never been studied rigorously and systematically in the context of SAI preconditioning. How to solve them and select $tol_k$, $k=1,2,\ldots,n$, satisfying these two requirements is very significant and turns out to be nontrivial.

Over the years, dropping has been performed empirically in the literature. One commonly uses a dropping tolerance $tol$ in the absolute sense: set $m_{jk}$ to zero if its magnitude is below some empirical $tol$, say $10^{-3}$; see, e.g., [151 EHJ EH ES]. Due to the absence of a solid theory, doing so is problematic and may either miss important entries or retain too many superfluous ones if the $tol$'s are chosen improperly.

For general purposes, we should take the size of $m_k$ itself into account when dropping small entries $m_{jk}$, $1\le j\le n$, in $m_k$. Define $f_k$ to be the $n$-dimensional vector whose nonzero entries $f_{jk}=m_{jk}$ are those to be dropped in $m_k$. Precisely, we drop these $m_{jk}$'s in $m_k$ when

$$\frac{\|f_k\|}{\|m_k\|}\le\mu_k,\quad k=1,2,\ldots,n, \tag{5}$$

for some suitable norm $\|\cdot\|$, where $\mu_k$ is a small quantity that should be chosen carefully based on some solid theory. For carefully chosen $\mu_k$'s, only by using criterion (5) can we drop the truly relatively small entries in $m_k$, avoiding dropping significant entries when $\|m_k\|$ is small or retaining insignificant entries when $\|m_k\|$ is large.
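To illustrate why (5) measures smallness relative to $\|m_k\|$, the sketch below compares an absolute threshold with a relative rule on an invented column and on the same column scaled by $10^{-3}$. The greedy rule in `drop_relative` (discard the smallest entries while $\|f_k\|_1\le\mu_k\|m_k\|_1$) is our own hypothetical way of satisfying (5) in the 1-norm, not the selection rule derived later in this section.

```python
import numpy as np

def drop_absolute(m, tol):
    """Absolute criterion: zero every entry with |m_j| < tol."""
    return np.where(np.abs(m) < tol, 0.0, m)

def drop_relative(m, mu):
    """Zero the smallest entries of m while the dropped part f keeps
    ||f||_1 <= mu * ||m||_1, i.e. criterion (5) in the 1-norm."""
    budget = mu * np.abs(m).sum()
    out = m.copy()
    dropped = 0.0
    for j in np.argsort(np.abs(m)):        # smallest magnitudes first
        if dropped + abs(m[j]) > budget:
            break
        dropped += abs(m[j])
        out[j] = 0.0
    return out


m = np.array([0.5, 0.02, 0.003, 0.0004])   # hypothetical column m_k
for scale in (1.0, 1e-3):
    print(scale, drop_absolute(scale * m, 1e-3), drop_relative(scale * m, 0.01))
```

The absolute threshold wipes out the scaled column entirely, whereas the relative rule keeps the same two entries in both cases, i.e., it is invariant under scaling of $m_k$.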

Next we establish a few results that play a vital role in selecting $\mu_k$ in (5) effectively. Unless stated otherwise, the matrix norm $\|\cdot\|$ denotes a general induced matrix norm, which includes the 1-norm, the infinity norm and the spectral norm.

Theorem 1. Assume that $\|AM-I\|\le\varepsilon<1$ for some induced matrix norm $\|\cdot\|$. Then $M$ is nonsingular. Define $M_d=M-F$. If $F$ satisfies

$$\|F\|<\frac{1-\varepsilon}{\|A\|}, \tag{6}$$

then $M_d$ is nonsingular.

Proof. Suppose that $M$ is singular and let $w$ with $\|w\|=1$ satisfy $Mw=0$, i.e., $w$ is an eigenvector associated with a zero eigenvalue of $M$. Then for the induced matrix norm we have

$$\|AM-I\|\ge\|(AM-I)w\|=\|w\|=1,$$

a contradiction to the assumption that $\|AM-I\|<1$. So $M$ is nonsingular. Since

$$M_d=M-F=M(I-M^{-1}F), \tag{7}$$

from (6) we have

$$\|M^{-1}F\|\le\|M^{-1}\|\,\|F\|<(1-\varepsilon)\frac{\|M^{-1}\|}{\|A\|}. \tag{8}$$

Since

$$\bigl|\|A\|-\|M^{-1}\|\bigr|\le\|A-M^{-1}\|=\|(AM-I)M^{-1}\|\le\|AM-I\|\,\|M^{-1}\|\le\varepsilon\|M^{-1}\|,$$

we get

$$(1-\varepsilon)\|M^{-1}\|\le\|A\|\le(1+\varepsilon)\|M^{-1}\|,$$

which means

$$\frac{\|M^{-1}\|}{\|A\|}\le\frac{1}{1-\varepsilon}. \tag{9}$$

Substituting (9) into (8), we have

$$\|M^{-1}F\|<1, \tag{10}$$

from which it follows that $I-M^{-1}F$ in (7) is nonsingular, and so is $M_d$. $\square$
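A quick numerical sanity check of Theorem 1 in the induced 1-norm follows. The matrices are randomly generated, hypothetical test data, and the check merely illustrates inequalities (6), (9) and (10); it is not a substitute for the proof.

```python
import numpy as np

def one_norm(X):
    """Induced matrix 1-norm (maximum absolute column sum)."""
    return np.linalg.norm(X, 1)

rng = np.random.default_rng(0)
n = 8
A = 5 * np.eye(n) + rng.uniform(-1, 1, (n, n))          # hypothetical test matrix
M = np.linalg.inv(A) + 1e-3 * rng.standard_normal((n, n))
I = np.eye(n)

eps = one_norm(A @ M - I)                                # ||AM - I||_1, small here
F = rng.standard_normal((n, n))
F *= 0.99 * (1 - eps) / (one_norm(A) * one_norm(F))     # enforce (6) with a margin
Minv = np.linalg.inv(M)

ratio = one_norm(Minv) / one_norm(A)                     # left-hand side of (9)
contraction = one_norm(Minv @ F)                         # left-hand side of (10)
Md = M - F
print(f"eps={eps:.2e}, ratio={ratio:.3f} <= {1 / (1 - eps):.3f}, ||M^-1 F||={contraction:.3f}")
```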

Now denote by $M_d$ the sparse approximate inverse of $A$ obtained by PSAI($tol$). Then $M_d$ aims to retain the large entries $m_{ij}$ of $M$ and drop its small entries $m_{ij}$. The small entries that are dropped, if any, are the nonzero entries of the matrix $F$. So $M_d$ is sparser than $M$, and its number of nonzero entries equals that of $M$ minus that of $F$. Theorem 1 shows that, for given $\varepsilon$ and $l_{\max}$, we should select $tol_k$, $k=1,2,\ldots,n$, such that (6) holds in order to ensure the non-singularity of $M_d$, provided that BPSAI yields a nonsingular $M$; otherwise, $M_d$ may be singular.

In order to obtain an $M_d$ comparable to $M$ as an approximation to $A^{-1}$, we need to impose further restrictions on $F$ and $\varepsilon$, as indicated below.

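Theorem 2. Assume that $\|AM-I\|\le\varepsilon<1$. Then $M$ is nonsingular. Let $M_d=M-F$. If

$$\|F\|<\min\left\{\frac{1-\varepsilon}{\|A\|},\frac{\varepsilon}{\|A\|}\right\}, \tag{11}$$

then $M_d$ is nonsingular and

$$\|AM_d-I\|<\min\{1,2\varepsilon\}. \tag{12}$$

Specifically, if $\varepsilon<0.5$, then (11) becomes

$$\|F\|<\frac{\varepsilon}{\|A\|} \tag{13}$$

and

$$\|AM_d-I\|<2\varepsilon. \tag{14}$$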

Proof. The non-singularity of $M$ is already proved in Theorem 1. Since $F$ satisfying (11) must satisfy (6), the non-singularity of $M_d$ follows from Theorem 1 directly. From $\|AM-I\|\le\varepsilon$ and (11), we obtain

$$\|AM_d-I\|=\|AM-AF-I\|\le\|AM-I\|+\|A\|\,\|F\|<\varepsilon+\min\{\varepsilon,1-\varepsilon\}=\min\{1,2\varepsilon\}.$$

Relations (13) and (14) follow directly from (11) and (12), respectively. $\square$
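The residual bound (14) can also be observed on hypothetical random data. The sketch below scales a random $F$ so that (13) holds in the 1-norm with a small margin and compares $\|AM_d-I\|_1$ with $2\varepsilon$; the matrices and the margin 0.999 are our own choices.

```python
import numpy as np

def one_norm(X):
    """Induced matrix 1-norm (maximum absolute column sum)."""
    return np.linalg.norm(X, 1)

rng = np.random.default_rng(1)
n = 10
A = 6 * np.eye(n) + rng.uniform(-1, 1, (n, n))          # hypothetical test matrix
M = np.linalg.inv(A) + 1e-3 * rng.standard_normal((n, n))
I = np.eye(n)

eps = one_norm(A @ M - I)                                # plays the role of eps (< 0.5 here)
F = rng.standard_normal((n, n))
F *= 0.999 * eps / (one_norm(A) * one_norm(F))          # condition (13) with a small margin
res_d = one_norm(A @ (M - F) - I)                        # ||A M_d - I||_1
print(f"eps = {eps:.3e}, ||A M_d - I||_1 = {res_d:.3e}, 2*eps = {2 * eps:.3e}")
```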



In what follows we always assume that $\varepsilon<0.5$, so that condition (11) reduces to (13) and, whenever (13) holds, the residual satisfies $\|AM_d-I\|<2\varepsilon<1$.

It is known that $M$ is a good approximation to $A^{-1}$ for small $\varepsilon$. Theorem 2 tells us that if the dropping tolerances $tol_k$ make $F$ satisfy (13), then $M_d$ and $M$ have comparable residuals and are approximate inverses of $A$ with comparable accuracy, provided that $\varepsilon$ is fairly small. In this case, we expect them to possess similar preconditioning quality for a Krylov solver, so that the Krylov solver preconditioned by $M_d$ and by $M$ uses comparable numbers of iterations to achieve convergence, as will be confirmed by numerical experiments.

We now present a theorem under the assumption that $\|Am_k-e_k\|\le\varepsilon$ for $k=1,2,\ldots,n$, which is exactly the stopping criterion used in the PSAI and SPAI algorithms, among others, where the norm is the 2-norm.



Theorem 3. Let $M=[m_1,m_2,\ldots,m_n]$ satisfy $\|Am_k-e_k\|\le\varepsilon<0.5$ for $k=1,2,\ldots,n$, and let $M_d=M-F=[m_1^d,m_2^d,\ldots,m_n^d]$ with $F=[f_1,f_2,\ldots,f_n]$. If $\|f_k\|<\frac{\varepsilon}{\|A\|}$, $k=1,2,\ldots,n$, then

$$\|Am_k^d-e_k\|<2\varepsilon. \tag{15}$$

Proof. Let $r_k=Am_k-e_k$. Then from $m_k^d=m_k-f_k$ we get

$$\|Am_k^d-e_k\|=\|r_k-Af_k\|\le\|r_k\|+\|Af_k\|\le\varepsilon+\|A\|\,\|f_k\|,$$

from which and the assumption on $\|f_k\|$ it follows that (15) holds. $\square$
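Theorem 3 is column-wise, matching the stopping criterion of PSAI and SPAI. The sketch below, on hypothetical random data, uses the 2-norm for the columns and the spectral norm $\|A\|_2$, scales each $f_k$ so that $\|f_k\|_2<\varepsilon/\|A\|_2$, and checks (15) for every column.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
A = 6 * np.eye(n) + rng.uniform(-1, 1, (n, n))          # hypothetical test matrix
M = np.linalg.inv(A) + 1e-3 * rng.standard_normal((n, n))
I = np.eye(n)

col_res = np.linalg.norm(A @ M - I, axis=0)             # ||A m_k - e_k||_2 per column
eps = col_res.max()                                      # common bound eps of Theorem 3
normA = np.linalg.norm(A, 2)                             # spectral norm ||A||_2
F = rng.standard_normal((n, n))
F *= 0.999 * eps / (normA * np.linalg.norm(F, axis=0))  # column-wise: ||f_k||_2 < eps/||A||_2
col_res_d = np.linalg.norm(A @ (M - F) - I, axis=0)     # ||A m_k^d - e_k||_2
print(col_res_d.max() / eps)                             # stays below 2 by (15)
```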

This theorem indicates that, given $\varepsilon<0.5$ and $l_{\max}$, if the while-loop in BPSAI terminates because $\|r_k\|\le\varepsilon$ for all $k$ and the $tol_k$'s are selected such that (13) holds (which implies $\|f_k\|=\|Fe_k\|\le\|F\|<\varepsilon/\|A\|$ for the 1-, 2- and infinity norms), then the corresponding columns of $M_d$ and $M$ have accuracy of the same order, provided that $\varepsilon$ is fairly small. Hence $M_d$ and $M$ are approximate inverses of $A$ with comparable accuracy and are expected to have similar preconditioning quality.

Theorems 2 and 3 are fundamental and relate the quality of $M_d$ to that of $M$ in terms of $\varepsilon$ quantitatively and explicitly. They provide the necessary ingredients for reasonably selecting the $\mu_k$'s to get a possibly much sparser preconditioner $M_d$ with preconditioning quality similar to that of $M$. In what follows we present a detailed analysis and propose robust and effective selection criteria for the dropping tolerances $tol_k$.

For given $l_{\max}$, suppose that BPSAI has computed a nonsingular $M$ that satisfies $\|Am_k-e_k\|\le\varepsilon<0.5$ for $k=1,2,\ldots,n$. The crucial point is to combine (5) with Theorems 2 and 3 in an organic and correct way. Based on (5) and (13), we require that

$$\|f_k\|<\min\left\{\mu_k\|m_k\|,\frac{\varepsilon}{\|A\|}\right\} \tag{16}$$

at every loop in PSAI($tol$).

However, (16) itself does not tell us how to select $\mu_k$ or how to drop each small entry $m_{jk}$ in $m_k$, so a closer analysis is needed. Notice that Theorems 1-3 hold for any induced matrix norm. For our purpose, the 1-norm $\|\cdot\|_1$ appears to be a reasonable and practical choice, since it unifies (5) and condition (13) both elegantly and practically by noting that $\max_{1\le k\le n}\|f_k\|_1=\|F\|_1$. The spectral norm $\|\cdot\|_2$ is theoretically usable, but it is expensive to compute $\|A\|_2$. The infinity norm $\|\cdot\|_\infty$, however, does not fit nicely into our use, because there is no elegant and sharp relationship between $\|f_k\|_\infty$ in (5) and $\|F\|_\infty$ in condition (13).
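The following lines illustrate numerically why the 1-norm fits (16) so well: $\|F\|_1$ equals the largest column 1-norm, whereas column-wise bounds in the infinity norm only control $\|F\|_\infty$ up to a factor $n$. The random matrix $F$ is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
F = rng.standard_normal((n, n))            # hypothetical matrix of dropped entries

col_1 = np.abs(F).sum(axis=0)              # ||f_k||_1 for each column
col_inf = np.abs(F).max(axis=0)            # ||f_k||_inf for each column

# 1-norm: the matrix norm is exactly the largest column norm
print(np.linalg.norm(F, 1), col_1.max())
# infinity norm: ||F||_inf is a maximum row sum, so column-wise bounds only give
# max_k ||f_k||_inf <= ||F||_inf <= n * max_k ||f_k||_inf
print(col_inf.max(), np.linalg.norm(F, np.inf), n * col_inf.max())
```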

We first investigate how to choose $f_k$ to make (16) hold. It suffices to drop $m_{jk}$, $1\le j\le n$, in $m_k$ as long as its magnitude is below $\min\{\mu_k\|m_k\|_1,\frac{\varepsilon}{\|A\|_1}\}$ divided by $\mathrm{nnz}(f_k)$. Since $\mathrm{nnz}(f_k)$ is not known a priori, in practice we replace it by the currently available $\mathrm{nnz}(m_k)$ before dropping, which is an upper bound for $\mathrm{nnz}(f_k)$. Therefore, we can take the adaptive dropping tolerances $tol_k$ for the entries $m_{jk}$ as

$$tol_k=\frac{\min\left\{\mu_k\|m_k\|_1,\frac{\varepsilon}{\|A\|_1}\right\}}{\mathrm{nnz}(m_k)},\quad k=1,2,\ldots,n. \tag{17}$$

Next we turn to our central concern: how to relate $\mu_k$ and $tol_k$ to $\varepsilon$. The two terms in the numerator of $tol_k$ in (17) have their own justifications and play different roles, and consequently neither can be ignored. The term $\mu_k\|m_k\|_1$ serves to drop small entries in $M$, while $\frac{\varepsilon}{\|A\|_1}$ guarantees the non-singularity of $M_d$ and makes its preconditioning quality comparable to that of $M$. Considering only the first term, as our experiments in the next section confirm, may deliver a poor and even (numerically) singular $M_d$ if

$$\mu_k\|m_k\|_1>\frac{\varepsilon}{\|A\|_1}\quad\text{and}\quad tol_k=\frac{\mu_k\|m_k\|_1}{\mathrm{nnz}(m_k)} \tag{18}$$

for some $k$, causing ineffectiveness and even complete failure of preconditioning. On the other hand, if

$$\mu_k\|m_k\|_1<\frac{\varepsilon}{\|A\|_1}\quad\text{and}\quad tol_k=\frac{\mu_k\|m_k\|_1}{\mathrm{nnz}(m_k)},\quad k=1,2,\ldots,n, \tag{19}$$

the sparsity pattern of $M_d$ is determined by $\mu_k$. In this case, however, we must realize that the smaller $\mu_k$ is, the denser $M_d$ is, while such a $\mu_k$ does not essentially improve the quality of $M_d$ as a sparse approximate inverse. As a result, we must combine the two terms organically. As the above arguments show, the most effective $\mu_k$ should make the two terms equal, that is, we should choose

$$\mu_k=\frac{\varepsilon}{\|m_k\|_1\|A\|_1} \tag{20}$$

and

$$tol_k=\frac{\varepsilon}{\mathrm{nnz}(m_k)\|A\|_1}. \tag{21}$$

Only in this way can we get an $M_d$ that is as sparse as possible and has preconditioning quality similar to that of $M$, thus making the construction and application of $M_d$ as cheap as possible. Note that $\mu_k$ and $tol_k$ vary during the while-loop of Algorithm 2, since $\|m_k\|_1$ and $\mathrm{nnz}(m_k)$ change as $l$ increases from 0 to $l_{\max}$ in the PSAI algorithms.
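Putting (4) and (21) together, the sketch below is a hypothetical dense rendering of Algorithm 2 for one column with the adaptive tolerance $tol_k$. The name `psai_tol_column` and the following simplifications are our assumptions, not the implementation of [25]: LS problems are re-solved from scratch, patterns are propagated with $|A|$, and $\mathrm{nnz}(m_k)$ in (21) is taken as the number of LS unknowns before dropping, an upper bound on $\mathrm{nnz}(f_k)$ as in (17).

```python
import numpy as np

def psai_tol_column(A, k, eps, lmax):
    """Hypothetical dense sketch of Algorithm 2 (PSAI(tol)) for column k with the
    adaptive tolerance (21): tol_k = eps / (nnz(m_k) * ||A||_1)."""
    n = A.shape[0]
    normA1 = np.linalg.norm(A, 1)          # ||A||_1 (max column sum), computed once
    e_k = np.zeros(n)
    e_k[k] = 1.0
    a = e_k.copy()                         # a_k^0 = e_k
    J = {k}                                # cumulative power pattern J_k^l
    S = [k]                                # pattern S_k^l of m_k after dropping

    def solve(S):
        # LS problem (4), re-solved from scratch (no updating as in [25])
        m = np.zeros(n)
        sol, *_ = np.linalg.lstsq(A[:, S], e_k, rcond=None)
        m[S] = sol
        return m, np.linalg.norm(A @ m - e_k)

    m, res = solve(S)
    l = 0
    while res > eps and l < lmax:
        a = np.abs(A) @ a                  # structural pattern of a_k^{l+1}
        new = [int(j) for j in np.flatnonzero(a) if int(j) not in J]
        l += 1
        if not new:                        # steps 6-8: no new indices
            continue
        J.update(new)                      # J_k^{l+1}
        S = S + new                        # step 9: S_k^{l+1} = S_k^l united with J
        m, res = solve(S)                  # step 10 (res is the pre-drop residual)
        tol_k = eps / (len(S) * normA1)    # (21), len(S) >= nnz(m_k) before dropping
        drop = [j for j in S if abs(m[j]) < tol_k]
        if drop:
            m[drop] = 0.0                  # step 11: drop small entries ...
            S = [j for j in S if j not in drop]   # ... and remove them from S
    return m, np.linalg.norm(A @ m - e_k)  # residual after the last drop


# Usage on a hypothetical nonsymmetric tridiagonal test matrix
n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
m3, res3 = psai_tol_column(A, 3, 0.05, 6)
print(np.count_nonzero(m3), res3)
```

Because every dropped entry is below $tol_k$ and at most $|\mathcal{S}_k^{l+1}|$ entries are dropped, $\|f_k\|_1\le\varepsilon/\|A\|_1$, hence $\|Af_k\|_2\le\|Af_k\|_1\le\varepsilon$; when the loop stops with a pre-drop residual of at most $\varepsilon$, the returned residual is therefore below $2\varepsilon$, in line with Theorem 3.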

We comment that in the literature one usually uses fixed dropping tolerances $\mu_k$ and $tol_k$, which are chosen empirically and heuristically without taking $\varepsilon$ into consideration; see, e.g., [15 1 125 1 [M l I35j. Our theory has shown that the non-singularity of $M_d$ and its preconditioning quality depend critically on the selection of dropping tolerances and can be sensitive to them. If a fixed dropping tolerance $tol$ is larger than $tol_k$ in (21) for some $k$ during the construction of $M_d$, we have found experimentally that the $M_d$'s obtained by PSAI($tol$) can be exactly singular in finite precision arithmetic, resulting in complete failure of Krylov solvers. However, reducing $tol$ by one order of magnitude yielded nonsingular and high-quality $M_d$'s. These results demonstrate that improperly chosen, though seemingly small, tolerances can indeed make preconditioning fail completely, and that the quality and effectiveness of $M_d$ depend crucially on the $tol_k$'s.

Since Theorems 1-3 hold for any given approximate inverse $M$ of $A$ and do not depend on a specific F-norm minimization based SAI preconditioning procedure, our theory works for all the static F-norm minimization based SAI preconditioning techniques. The difference, and simplification, is that for a static SAI procedure $\mu_k$ in (20) and $tol_k$ are fixed for each $k$, since $m_k$ and $\mathrm{nnz}(m_k)$ are already determined before dropping is performed on $M$. In practice, after computing $M$ by a static SAI procedure, we record $\|Am_k-e_k\|=\varepsilon_k$, $k=1,2,\ldots,n$, and compute the constants $\mathrm{nnz}(m_k)$, $k=1,2,\ldots,n$. Then, by (21) with $\varepsilon$ replaced by $\varepsilon_k$, we drop $m_{jk}$ whenever

$$|m_{jk}|<tol_k=\frac{\varepsilon_k}{\mathrm{nnz}(m_k)\|A\|_1},\quad j=1,2,\ldots,n. \tag{22}$$
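A static postfiltration step based on (22) takes only a few lines. In this hypothetical sketch, `postfilter` records $\varepsilon_k$ in the 2-norm (our choice, since the text leaves the norm implicit), assumes every column of $M$ has at least one nonzero, and uses a thresholded exact inverse as a stand-in for the output of a static SAI procedure.

```python
import numpy as np

def postfilter(A, M):
    """Hypothetical static postfiltration by (22):
    drop m_jk with |m_jk| < eps_k / (nnz(m_k) * ||A||_1)."""
    n = A.shape[0]
    normA1 = np.linalg.norm(A, 1)                      # ||A||_1
    eps_k = np.linalg.norm(A @ M - np.eye(n), axis=0)  # recorded residuals (2-norm)
    nnz_k = np.count_nonzero(M, axis=0)                # nnz(m_k) before dropping (> 0)
    tol = eps_k / (nnz_k * normA1)                     # one tolerance per column
    Md = np.where(np.abs(M) < tol[np.newaxis, :], 0.0, M)
    return Md, eps_k


# Hypothetical "static" M: the exact inverse with entries below 1e-4 removed
n = 12
A = 4 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
Ainv = np.linalg.inv(A)
M = np.where(np.abs(Ainv) < 1e-4, 0.0, Ainv)
Md, eps_k = postfilter(A, M)
res_d = np.linalg.norm(A @ Md - np.eye(n), axis=0)
print(np.count_nonzero(M), np.count_nonzero(Md), (res_d / eps_k).max())
```

Each dropped part satisfies $\|f_k\|_1\le\mathrm{nnz}(m_k)\,tol_k=\varepsilon_k/\|A\|_1$, so $\|Am_k^d-e_k\|_2\le2\varepsilon_k$, which is what the printed ratio checks.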
In such a way, we get a new, sparser approximate inverse $M_d$ that has preconditioning quality similar to that of the possibly much denser $M$ obtained by the static SAI procedure without dropping. However, we should remind the reader that, unlike the adaptive PSAI($tol$), where small entries below the $tol_k$'s are dropped immediately once they are generated during the while-loop of Algorithm 2, static SAI procedures perform sparsification only after they have computed $M$, which is possibly much denser than $M_d$. So static SAI preconditioning does not save the computational cost of constructing $M_d$.

4 Numerical experiments 

In this section we consider a number of real-world problems from scientific and industrial applications, which are described in Table [3JJ. We shall demonstrate the effectiveness of our selection criteria for dropping tolerances applied to PSAI($tol$) and three static F-norm minimization based SAI preconditioning procedures.

The numerical experiments are performed on an Intel(R) Core(TM)2 Duo E8400 processor at 3.00 GHz with 2 GB of main memory, using Matlab 7.8.0 with machine precision $\epsilon_{\mathrm{mach}} = 2.22 \times 10^{-16}$ under the Linux operating system. We run all the algorithms sequentially. We observed that constructing $M$ is often very costly and dominates the total cost. However, F-norm minimization based SAI procedures are amenable to parallelization in a distributed environment, since the columns of $M$ are computed independently. Preconditioning is from the right except for pores_2, for which we found that left preconditioning considerably outperforms right preconditioning in our tests; it appears that PSAI(tol) approximates the rows of the inverse of pores_2 more effectively than its columns. The Krylov solvers employed are BiCGSTAB and the restarted GMRES(50) algorithm [1], and we use the codes from Matlab 7.8.0. Note that if bicgstab.m reports $k$ iterations, the dimension of the Krylov subspace is $2k$ and BiCGSTAB performs $2k$ matrix-vector products.
The initial guess is always $x_0 = 0$, and the right-hand side $b$ is formed by choosing the solution $x = [1, 1, \ldots, 1]^T$. The stopping criterion is

$$\frac{\|b - Ax_m\|_2}{\|b\|_2} < 10^{-8}, \qquad x_m = My_m, \tag{23}$$

where $y_m$ is the approximate solution obtained by BiCGSTAB or GMRES(50) applied to the preconditioned linear system $AMy = b$.
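To make the stopping rule concrete for right preconditioning, here is a small NumPy sketch. It is only an illustration under stated assumptions: the helper `stopping_test` is hypothetical, a Jacobi (diagonal) matrix stands in for an SAI preconditioner, and a direct solve of $AMy = b$ stands in for BiCGSTAB or GMRES(50). The point is that the residual in (23) is measured for the original system after mapping $y_m$ back through $x_m = My_m$.

```python
import numpy as np


def stopping_test(A, b, M, y, tol=1e-8):
    """Relative-residual test (23) under right preconditioning.

    The solver works on A M y = b; the iterate is mapped back via x = M y
    and the residual is measured for the original system A x = b.
    """
    x = M @ y
    relres = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    return relres < tol, x, relres


n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x_true = np.ones(n)                  # solution [1, 1, ..., 1]^T, as in the experiments
b = A @ x_true
M = np.diag(1.0 / np.diag(A))        # hypothetical stand-in preconditioner (Jacobi)
y = np.linalg.solve(A @ M, b)        # stand-in for a Krylov solve of A M y = b
converged, x, relres = stopping_test(A, b, M, y)
```

Since $b - Ax_m = b - AMy_m$, the quantity tested in (23) equals the residual of the preconditioned system itself, which a right-preconditioned Krylov solver already monitors.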

In the experiments, we use different $\varepsilon$'s to control the quality of $M$, and we always set $l_{\max} = 5$. Recall that every while-loop of the PSAI(tol) algorithms terminates when $\|Am_k - e_k\|_2 < \varepsilon$ or $l > l_{\max}$. In all the tables, nnz denotes the number of nonzeros of a test matrix; iter_b and iter_g denote the iteration counts of BiCGSTAB and GMRES(50), respectively; spar $= nnz(M)/nnz(A)$ denotes the sparsity of $M$ relative to $A$; ptime and stime are the CPU timings (in seconds) for constructing $M$ and for solving the preconditioned linear systems by the iterative solvers, respectively; † indicates that convergence is not attained within 1000 iterations; and $num^-$ denotes a CPU time of at most num seconds.

Table 1: The description of test matrices.¹

Matrix      n       nnz      Description
epb1        14734   95053    Plate-fin heat exchanger
fidap024    2283    48733    Computational fluid dynamics problem
fidap028    2603    77653    Computational fluid dynamics problem
fidap031    3909    115299   Computational fluid dynamics problem
fidap036    3079    53851    Computational fluid dynamics problem
nos3        960     8402     Biharmonic equation
nos6        675     1965     Poisson equation
orsreg_1    2205    14133    Oil reservoir simulation, Jacobian matrix
orsirr_1    1030    6858     As ORSREG1, but unnecessary cells coalesced
orsirr_2    886     5970     As ORSIRR1, with further coarsening of grid
pores_2     1224    9613     Reservoir simulation
sherman1    1000    3750     Oil reservoir simulation, 10 × 10 × 10 grid
sherman2    1080    23094    Oil reservoir simulation, 6 × 6 × 5 grid
sherman3    5005    20033    Oil reservoir simulation, 35 × 11 × 13 grid
sherman4    1104    3786     Oil reservoir simulation, 16 × 23 × 3 grid
sherman5    3312    20793    Oil reservoir simulation, 16 × 23 × 3 grid

¹ All of these matrices are from the Matrix Market of the National Institute of Standards and Technology at http://math.nist.gov/MatrixMarket or from the University of Florida Sparse Matrix Collection at http://www.cise.ufl.edu/research/sparse/matrices/.

We test all the matrices listed in Table 1 and report the results in Tables 2–9. Before explaining the tables in detail, we point out an important observation: given $l_{\max} = 5$, for every $\varepsilon = 0.2, 0.3, 0.4$, we found that the $M$'s obtained by BPSAI and the $M_d$'s obtained by PSAI(tol) with $tol_k$ defined by (21) or smaller indeed satisfy $\|Am_k - e_k\|_2 < \varepsilon$ and $\|Am_k^d - e_k\|_2 < 2\varepsilon$, respectively, for almost all $k = 1, 2, \ldots, n$. The same holds for the three static SAI algorithms tested.
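The observation that dropping at most doubles the residual bound can be seen from an elementary chain of inequalities. The derivation below assumes $\|Am_k - e_k\|_2 < \varepsilon$ and that every dropped entry satisfies $|m_{jk}| < tol_k \le \varepsilon/(nnz(m_k)\|A\|_1)$; it is one way to see the bound, not necessarily the argument used in the theorems of this paper.

```latex
\begin{aligned}
\|Am_k^d - e_k\|_2
  &\le \|Am_k - e_k\|_2 + \|A(m_k - m_k^d)\|_2 \\
  &\le \varepsilon + \|A\|_1\,\|m_k - m_k^d\|_1
     && \text{since } \|Ax\|_2 \le \|Ax\|_1 \le \|A\|_1\,\|x\|_1, \\
  &< \varepsilon + \|A\|_1\, nnz(m_k)\, tol_k
     && \text{at most } nnz(m_k) \text{ entries dropped, each below } tol_k, \\
  &\le 2\varepsilon .
\end{aligned}
```

Replacing $\varepsilon$ by $\varepsilon_k$ gives the corresponding bound $2\varepsilon_k$ for the static rule (22).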



4.1 Results for PSAI(tol)

We shall illustrate that our selection criterion (22) for dropping tolerances is robust and effective for general purposes. After this, we present further test results that serve three purposes. First, we show that if $tol_k < \varepsilon/(nnz(m_k)\|A\|_1)$, then we may get a much denser $M_d$; so smaller $tol_k$'s do not essentially improve the quality of $M_d$, but rather make the construction (and application) of $M_d$ much more time consuming. Second, we show that for some examples smaller $tol_k$'s hardly change $M_d$, i.e., they retain few additional entries; this also indicates that our criterion yields appropriate dropping tolerances. Third, we show that if we set $tol_k > \varepsilon/(nnz(m_k)\|A\|_1)$ at some steps in the while-loop of Algorithm 2, then we may obtain a (numerically) singular $M_d$, causing preconditioning to fail completely.

We summarize the results obtained by the two Krylov solvers with and without PSAI(tol) preconditioning in Table 2, where we take the $tol_k$'s in (21) as dropping tolerances and $\varepsilon = 0.3$; $M = I$ means that no preconditioning is used. Both Krylov solvers are accelerated substantially by PSAI(tol) preconditioning: every problem converges with PSAI(tol), including those for which the unpreconditioned solvers fail within 1000 iterations, and the reduction in iterations is especially striking for sherman2. We observed that the total solution time is dominated by ptime, the time for constructing $M$, while stime is negligible in the serial computing environment, so we do not list stime here.

Table 2: Convergence results for all the test problems: unpreconditioned ($M = I$) and PSAI(tol) with $\varepsilon = 0.3$, $l_{\max} = 5$. Note: when BiCGSTAB takes $k$ iterations, the dimension of the Krylov subspace is $2k$.

                 M = I                PSAI(tol)
Matrix      iter_b   iter_g     spar   ptime    iter_b   iter_g
epb1        433      †          1.17   40.18    170      408
fidap024    †        †          5.15   34.86    49       99
fidap028    †        †          5.40   109.33   66       199
fidap031    †        †          2.95   48.41    103      410
fidap036    †        †          2.48   12.28    69       116
nos3        213      †          1.63   1.34     72       140
nos6        †        †          0.94   0.22     36       37
orsirr_1    †        †          2.02   2.62     30       49
orsirr_2    †        †          2.44   2.35     28       47
orsreg_1    687      346        1.17   5.87     48       76
pores_2     †        †          7.40   5.41     39       53
sherman1    356      †          2.67   0.83     25       41
sherman2    †        †          2.76   4.70     4        7
sherman3    †        †          1.86   4.94     144      866
sherman4    101      377        1.25   0.39     36       49
sherman5    †        †          1.77   2.54     33       46

Next we report experiments on the three matrices orsirr_1, orsirr_2 and orsreg_1 using PSAI(tol) with $\varepsilon = 0.2$ and a few $tol_k$'s smaller than the one defined by (21) as dropping tolerances. Denoting by RHS the right-hand side of (21), we take $tol_k$ to be RHS, RHS/100, RHS/1000 and RHS/10000, respectively, and examine the effects of smaller $tol_k$'s on the quality and sparsity of the $M$'s as well as on the computational costs. Table 3 reports the results, where $tol_k = 0$ recovers the generic BPSAI algorithm. First, the two Krylov solvers preconditioned by PSAI(tol) with the four given $tol_k$'s and by BPSAI converged very fast and solved all three problems successfully, as shown by the iter_b's and iter_g's. Second, the $M$'s obtained by PSAI(tol) with $tol_k$ satisfying (21) are naturally the sparsest, and they are much sparser than those obtained with the three smaller $tol_k$'s, while the $M$'s obtained by BPSAI are the densest. For these three problems, the $M$'s become denser quickly as $tol_k$ decreases. Constructing $M$ is cheapest with $tol_k = $ RHS, and the resulting preconditioning quality is comparable to that of BPSAI, although the solvers need somewhat more iterations (e.g., 43 and 69 versus 17 and 30 for orsreg_1). Hence smaller $tol_k$'s do not essentially improve the quality of the $M$'s; rather, they make the $M$'s much denser and more time consuming to construct.

For all the other test problems in Table 1, i.e., all except orsirr_1, orsirr_2 and orsreg_1, we made numerical experiments in the same way as described above. We found that the sparsity and preconditioning quality of the $M$'s obtained by PSAI(tol) with the four $tol_k$'s change little. This means that our dropping criterion (21) already removes the entries that can safely be discarded, and smaller $tol_k$'s bring no benefit. Together with Table 3, this supports the claim that our dropping criteria are effective and robust and that there is no need to use smaller $tol_k$'s in PSAI(tol).

Table 3: Effects of smaller $tol_k$'s for PSAI(tol) with $\varepsilon = 0.2$, $l_{\max} = 5$; RHS is the right-hand side of (21), and $tol_k = 0$ corresponds to BPSAI. Note: when BiCGSTAB takes $k$ iterations, the dimension of the Krylov subspace is $2k$.

Matrix     Quantity          tol_k = RHS  RHS/100   RHS/1000  RHS/10000  tol_k = 0
orsirr_1   spar              3.21         8.75      11.01     12.95      16.21
           ptime             3.82         5.58      6.27      6.95       5.90
           iter_b, iter_g    25, 40       17, 27    15, 26    15, 26     15, 26
           stime (×0.1)      0.3, 0.6     0.1, 0.3  0.1, 0.3  0.3, 0.4   0.2, 0.3
orsirr_2   spar              3.67         9.27      11.53     13.24      16.04
           ptime             3.37         4.79      5.54      6.02       4.63
           iter_b, iter_g    26, 39       16, 26    16, 25    15, 25     16, 25
           stime (×0.1)      0.2, 0.5     0.1, 0.4  0.2, 0.2  0.1, 0.2   0.1, 0.3
orsreg_1   spar              1.72         6.20      8.99      11.07      14.77
           ptime             8.10         11.13     14.07     17.27      16.84
           iter_b, iter_g    43, 69       20, 32    17, 30    17, 30     17, 30
           stime (×0.1)      0.6, 1.0     0.3, 0.8  0.5, 0.7  0.5, 0.6   0.3, 0.8

Table 4: Sensitivity of the quality of $M$'s to the dropping tolerances $tol_k$'s.

(a) Bad $\mu_k$'s resulting in (numerically) singular $M$'s by PSAI(tol). For nos6, condest(M) ≈ $10^{25}$ for $\varepsilon = 0.2, 0.3$ with $\mu_k = 10^{-4}$; for the others, condest(M) = INF.

Matrix     ε = 0.2         ε = 0.3         ε = 0.4
nos6       μ_k = 10^-4     μ_k = 10^-4     μ_k = 10^-3
sherman2   μ_k = 10^-3     μ_k = 10^-3     μ_k = 10^-3

(b) Good $\mu_k$'s leading to effective $M$'s.

Matrix     ε     μ_k      ptime   spar   iter_b   iter_g
nos6       0.2   10^-6    0.64    2.72   22       27
nos6       0.3   10^-5    0.24    0.93   34       38
nos6       0.4   10^-4    0.23    0.54   51       47
sherman2   0.2   10^-5    6.66    2.73   11       15
sherman2   0.3   10^-5    4.59    2.25   11       15
sherman2   0.4   10^-5    3.72    2.03   11       17

We now test the behavior of the $M$'s obtained by PSAI(tol) with improperly chosen dropping tolerances $tol_k$ that intuitively seem small. We attempt to show that larger $tol_k$'s of the form (18), for some given, seemingly small $\mu_k$'s, may produce (numerically) singular $M$'s. That is, at some steps in the while-loop of Algorithm 2 we take $\mu_k$ such that

$$tol_k = \frac{\mu_k\,\|m_k\|_1}{nnz(m_k)} > \frac{\varepsilon}{nnz(m_k)\,\|A\|_1},$$

where the lower bound on the right is exactly our dropping tolerance (21). We drop the entries whose magnitudes are below such an improper $tol_k$ for some given small $\mu_k$'s. Table 4(a) lists the parameters $\varepsilon$ and $\mu_k$ that produce a (numerically) singular $M$, in the sense that the Matlab function condest(M) returns INF or an extremely large value, e.g., around $10^{25}$. Table 4(b) reports the convergence results obtained by reducing each corresponding $\mu_k$ by one or two orders of magnitude. Clearly, this reduction substantially improves the effectiveness of the preconditioners: it not only delivers nonsingular $M$'s but also accelerates convergence considerably. These tests indicate that the nonsingularity and quality of the $M_d$ obtained by PSAI(tol) can be very sensitive to $tol_k$.
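The comparison above can be checked column by column. The following NumPy sketch, with the hypothetical helper `compare_tolerances`, computes the $\mu_k$-based tolerance $\mu_k\|m_k\|_1/nnz(m_k)$ and the tolerance of (21), and reports the threshold $\mu_k^* = \varepsilon/(\|m_k\|_1\|A\|_1)$ above which the former exceeds the latter; $\mu_k^*$ is a name introduced only for this illustration. The toy data ($A = 2I$ and its exact inverse column) are assumptions chosen so the numbers can be verified by hand.

```python
import numpy as np


def compare_tolerances(A, m_k, mu_k, eps):
    """Compare a mu-based tolerance with the tolerance of criterion (21).

    tol_mu = mu_k * ||m_k||_1 / nnz(m_k)   (the improper form used above)
    tol_21 = eps / (nnz(m_k) * ||A||_1)    (criterion (21))
    tol_mu > tol_21 holds exactly when mu_k > eps / (||m_k||_1 * ||A||_1).
    """
    nnz_k = max(np.count_nonzero(m_k), 1)
    norm_m_1 = np.linalg.norm(m_k, 1)        # ||m_k||_1 = sum of |entries|
    norm_A_1 = np.linalg.norm(A, 1)          # ||A||_1 = max column sum of |A|
    tol_mu = mu_k * norm_m_1 / nnz_k
    tol_21 = eps / (nnz_k * norm_A_1)
    mu_star = eps / (norm_m_1 * norm_A_1)    # largest mu_k not exceeding (21)
    return tol_mu, tol_21, mu_star, tol_mu > tol_21


A = 2.0 * np.eye(4)
m_k = np.array([0.0, 0.5, 0.0, 0.0])         # exact column k = 1 of A^{-1}
bad = compare_tolerances(A, m_k, mu_k=0.5, eps=0.2)
good = compare_tolerances(A, m_k, mu_k=0.1, eps=0.2)
```

For this toy column, $\mu_k = 0.5$ gives $tol_k = 0.25 > 0.1$, an improper tolerance in the sense above, while $\mu_k = 0.1$ gives $0.05 < 0.1$; the threshold is $\mu_k^* = 0.2$.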
Table 5: Sensitivity of the quality of $M$'s to fixed dropping tolerances $tol$'s.

(a) Bad tol's resulting in numerically singular $M$'s for $\varepsilon = 0.4$.

Matrix   nos3    nos6    orsirr_1   orsirr_2   orsreg_1   sherman5
tol      10^-2   10^-6   10^-5      10^-5      10^-4      10^-2

(b) Good tol's leading to effective $M$'s for $\varepsilon = 0.4$.

Matrix     tol     ptime   spar   iter_b   iter_g
nos3       10^-3   0.58    0.16   122      845
nos6       10^-7   0.22    0.43   61       †
orsirr_1   10^-6   2.03    0.63   116      286
orsirr_2   10^-6   1.61    0.68   78       229
orsreg_1   10^-5   7.23    0.37   144      309
sherman5   10^-3   3.18    0.44   67       130



Table 4 concerns dynamically varying $tol_k$'s. We have also tested PSAI(tol) with several fixed $tol$'s used throughout the while-loop of Algorithm 2, with $\varepsilon = 0.4$. For a given fixed $tol$, PSAI(tol) drops the entries of $M$ whose magnitudes are below $tol$. Table 5(a) lists the matrices and the $tol$'s that lead to (numerically) singular $M$'s for $\varepsilon = 0.4$. However, if we reduce $tol$ by one order of magnitude, the resulting preconditioners are nonsingular and BiCGSTAB converges for every matrix; see Table 5(b) for details. Therefore, the nonsingularity and acceleration quality of the $M$'s can be sensitive to the choice of $tol$.

In summary, the quality of $M$ is very sensitive to dropping tolerances, and our selection criterion is fairly robust and effective for PSAI(tol).



4.2 Results for three static SAI procedures 

In this subsection, we test three static F-norm minimization based SAI preconditioning techniques with the patterns of $(I + A)^3$, $(I + |A| + |A^T|)^3A^T$ and $(AA^T)^2A^T$, respectively; see [22] for the effectiveness of these patterns. We intend to show the effectiveness of the dropping criterion (22) and to exhibit the sensitivity of the preconditioning quality of $M$ to a fixed dropping tolerance. We first compute $M$ by predetermining its pattern and solving $n$ independent LS problems, and we then obtain a sparser $M_d$ by dropping the entries of $M$ below the tolerance defined by (22) or below some empirically chosen tolerance.
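The construction of $M$ described above (predetermined pattern, then $n$ independent least-squares problems) can be sketched in NumPy as follows. This is a dense toy version under assumptions: the function name `static_sai` is hypothetical, and the pattern of $(I + A)^3$ is formed from $|A|$ so that no admissible entry disappears by cancellation. It is not the code used in the experiments.

```python
import numpy as np


def static_sai(A, power=3):
    """Static F-norm minimization SAI on the pattern of (I + |A|)^power.

    For each k, m_k minimizes ||A m_k - e_k||_2 over vectors supported on the
    nonzero rows of column k of the pattern; the n problems are independent.
    Building the pattern from |A| (an assumption of this sketch) prevents
    entries from vanishing through cancellation.
    """
    n = A.shape[0]
    pattern = np.linalg.matrix_power(np.eye(n) + np.abs(A), power) != 0
    M = np.zeros((n, n))
    eps = np.zeros(n)
    for k in range(n):
        J = np.flatnonzero(pattern[:, k])              # admissible rows of m_k
        e_k = np.zeros(n)
        e_k[k] = 1.0
        m_J, *_ = np.linalg.lstsq(A[:, J], e_k, rcond=None)
        M[J, k] = m_J
        eps[k] = np.linalg.norm(A[:, J] @ m_J - e_k)   # eps_k = ||A m_k - e_k||_2
    return M, eps


n = 30
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag(-1, 4, -1)
M, eps = static_sai(A)
```

The returned `eps` holds the residual norms $\varepsilon_k$ that enter (22). Practical codes would use sparse storage and restrict each least-squares problem to the nonzero rows of $A(:, J)$, which leaves the minimizer unchanged at much lower cost.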

We summarize the results in Tables 6–9, where ptime includes the time for predetermining the pattern of $M$, computing $M$ and sparsifying $M$. Table 6 lists the results obtained by the static SAI procedure with the pattern of $(I + A)^3$: Table 6(a) lists the matrices and the $tol$'s that lead to (numerically) singular $M$'s, and Table 6(b) shows the excellent performance of the $M_d$'s generated by the static SAI procedure when the corresponding $tol$'s in Table 6(a) are reduced by one order of magnitude. Tables 7–9 show the results obtained by the three static SAI procedures with the dropping criterion (22).

Table 6: Sensitivity of the quality of $M_d$'s to fixed $tol$'s for the static SAI procedure with the pattern of $(I + A)^3$. Note: when BiCGSTAB takes $k$ iterations, the dimension of the Krylov subspace is $2k$.

(a) Bad tol's resulting in numerically singular $M$'s.

Matrix   orsirr_1   orsirr_2   orsreg_1   pores_2   sherman5
tol      10^-5      10^-5      10^-3      10^-6     10^-2

(b) Good tol's leading to effective $M$'s.

Matrix     tol     ptime   spar   iter_b   iter_g
orsirr_1   10^-6   1.63    1.82   33       50
orsirr_2   10^-6   1.31    2.69   32       48
orsreg_1   10^-4   4.09    0.91   45       74
pores_2    10^-7   3.10    2.47   62       158
sherman5   10^-3   11.79   1.55   24       34

We observed that some columns have very small residual norms $\|Am_k - e_k\|_2 = \varepsilon_k$, some even at the level of $\epsilon_{\mathrm{mach}}$. Therefore, to drop as many small entries as possible, we replace those $\varepsilon_k$'s below 0.1 by 0.1 in (22). Each $M_d$ is much sparser than the corresponding $M$, and solving the preconditioned linear systems with $M_d$ takes no more time, and usually less, than with $M$. Moreover, each $M_d$ has essentially the same accelerating quality as the corresponding $M$: in Tables 7 and 8 the Krylov solvers use the same number of iterations with $M_d$ as with $M$ except in two cases (BiCGSTAB on orsreg_1 in Table 7 and on orsirr_2 in Table 8), which differ by at most five iterations. These results demonstrate that our selection criterion (22) is effective and robust. In addition, Table 9 shows that the pattern of $(AA^T)^2A^T$ leads to considerably denser $M$ and $M_d$, which are much more expensive to compute than with the other two patterns. Therefore, this static SAI procedure is less effective than the other two.
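The difference in density between the three patterns can already be seen on a toy problem. The following NumPy sketch (the function `pattern_nnz` is hypothetical) counts the admissible entries of the patterns $(I + A)^3$, $(I + |A| + |A^T|)^3A^T$ and $(AA^T)^2A^T$ for a tridiagonal $A$, forming them from $|A|$ to avoid cancellation. It only illustrates pattern sizes and does not reproduce Table 9.

```python
import numpy as np


def pattern_nnz(A):
    """Count admissible entries in the three static SAI patterns.

    Patterns are formed from |A| so that no entry vanishes by cancellation
    (an assumption of this sketch):
        P1 = (I + |A|)^3
        P2 = (I + |A| + |A^T|)^3 |A^T|
        P3 = (|A| |A^T|)^2 |A^T|
    """
    n = A.shape[0]
    I = np.eye(n)
    absA, absAT = np.abs(A), np.abs(A.T)
    P1 = np.linalg.matrix_power(I + absA, 3)
    P2 = np.linalg.matrix_power(I + absA + absAT, 3) @ absAT
    P3 = np.linalg.matrix_power(absA @ absAT, 2) @ absAT
    return [int(np.count_nonzero(P)) for P in (P1, P2, P3)]


n = 30
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiagonal toy matrix
counts = pattern_nnz(A)
```

For a tridiagonal matrix the three patterns are banded with half-bandwidths 3, 4 and 5, respectively, so for $n = 30$ they contain 198, 250 and 300 admissible entries; the computed $M$ can be at most this dense.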



5 Conclusions 

Selection criteria for dropping tolerances are vital to SAI preconditioning. However, this important problem has received little attention and has never been studied rigorously and systematically. For F-norm minimization based SAI preconditioning, such criteria affect both the nonsingularity of a preconditioner $M$ and its quality and effectiveness. Improper dropping tolerances may produce (numerically) singular $M$'s, causing preconditioning to fail completely. To develop a robust and effective PSAI(tol) preconditioning procedure, we have carefully analyzed the effects of dropping tolerances on the nonsingularity, quality and effectiveness of preconditioners, and we have established close relationships between them. Based on these relationships, we have proposed robust and effective selection criteria for dropping tolerances that make $M$ as sparse as possible while keeping its accelerating quality similar to that of the $M$ obtained by BPSAI. The theory is general and applies to all static F-norm minimization based SAI preconditioning procedures. Numerical experiments have shown that our criteria work very well.

For generality and effectiveness, robust selection criteria for dropping tolerances also play a key role in other adaptive F-norm minimization based SAI preconditioning procedures that use dropping, e.g., the SPAI algorithm. Just as for PSAI(tol), dropping criteria there serve two purposes: to make an approximate inverse $M$ as sparse as possible, and to guarantee that its preconditioning quality is comparable to that obtained by the SAI preconditioning procedure without dropping. For factorized sparse approximate inverse preconditioning, such as the AINV and FSAI algorithms, dropping is equally important. There, the nonsingularity of the factorized $M$ is guaranteed naturally, but how to drop small entries is nontrivial and has not yet been studied systematically. These problems are significant and are currently under investigation.

Table 7: Static SAI procedure with the pattern of $(I + A)^3$. Note: when BiCGSTAB takes $k$ iterations, the dimension of the Krylov subspace is $2k$.

Matrix     ptime   Precond.   spar    iter_b   iter_g   stime_b   stime_g
orsirr_1   1.78    M          8.36    29       45       0.03      0.08
                   M_d        4.54    29       45       0.01      0.06
orsirr_2   1.56    M          8.62    30       44       0.02      0.04
                   M_d        5.24    30       44       0.02      0.03
orsreg_1   4.69    M          7.53    28       51       0.04      0.09
                   M_d        2.95    33       51       0.01      0.08
pores_2    3.00    M          9.25    52       118      0.09      0.14
                   M_d        4.98    52       118      0.06      0.13
sherman5   13.69   M          8.39    22       31       0.04      0.05
                   M_d        3.54    22       31       0.02      0.04

Table 8: Static SAI procedure with the pattern of $(I + |A| + |A^T|)^3A^T$. Note: when BiCGSTAB takes $k$ iterations, the dimension of the Krylov subspace is $2k$.

Matrix     ptime   Precond.   spar    iter_b   iter_g   stime_b   stime_g
orsirr_1   6.00    M          16.41   18       28       0.03      0.04
                   M_d        10.06   18       28       0.01      0.03
orsirr_2   5.11    M          17.03   18       28       0.02      0.05
                   M_d        11.23   16       28       0.01      0.02
orsreg_1   15.77   M          14.03   19       34       0.05      0.06
                   M_d        6.83    19       34       0.02      0.05
pores_2    13.48   M          18.42   26       38       0.05      0.07
                   M_d        11.91   26       38       0.05      0.06
sherman5   54.19   M          14.94   16       23       0.06      0.06
                   M_d        6.41    16       23       0.02      0.04